QBiC
These experiments investigate how splitting affects runtime and storage usage. CPU and memory were kept at the same values wherever possible.
We added more options to parallelise along the genome.
nextflow run nf-core/sarek -r 3.1.1 -profile cfc --input --outdir -c trace.config -c custom.config --nucleotides_per_second
custom.config
This runs mapping, duplicate marking, BQSR, QC, and variant calling, using BWA and the non-Spark GATK implementation.
Load all the individual samples and print a sample summary where possible (e.g. the metadata sheet).
| name | nucleotides_per_second (number of intervals) | fastp CPUs, --split_fastq value | tower id | work sizes | trace |
|---|---|---|---|---|---|
| fastp4_intervals78 | 10001 (78) | 4, --split_fastq 10000000000 | https://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces/cfc/watch/PE2um0F1SrwRi | yes | yes |
| fastp8_intervals40 | 70000 (40) | 8, --split_fastq 500000000 | https://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces/cfc/watch/52L0P2tE99JYXs | yes | yes |
| fastp12 | skipped | 12, --split_fastq 100000000 | https://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces/cfc/watch/ocENvYnNRQAGC | yes | yes |
| fastp16_intervals1 | 5000000 (1) | 16, --split_fastq 100000000 | https://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces/cfc/watch/2uPwaXSKrcUaKq | yes | yes |
| fastp0_intervals20 | 200000 (21) | 0, --split_fastq 0 | http://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces/cfc/watch/5iFlE6AEOimMO9 | yes | yes |
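Each run in the table has a Nextflow trace file. The sketch below shows one way such a trace could be loaded into R and its task runtimes converted to seconds. It is not the loading code used for this report. The helper names `parse_duration` and `load_trace` are hypothetical, and the code assumes the trace is tab-separated, has a `realtime` column, and stores durations as human-readable strings such as `1h 2m 3s` (the Nextflow default when raw output is not enabled).

```r
# Hypothetical helper: convert Nextflow-style duration strings
# (e.g. "1h 2m 3s", "350ms") into seconds. Unparseable values such as "-"
# become NA.
parse_duration <- function(x) {
  units <- c(ms = 0.001, s = 1, m = 60, h = 3600, d = 86400)
  vapply(as.character(x), function(value) {
    if (is.na(value)) return(NA_real_)
    # Extract every "<number><unit>" token from the string.
    tokens <- regmatches(value, gregexpr("[0-9.]+(ms|s|m|h|d)", value))[[1]]
    if (length(tokens) == 0) return(NA_real_)
    numbers <- as.numeric(sub("(ms|s|m|h|d)$", "", tokens))
    suffixes <- sub("^[0-9.]+", "", tokens)
    sum(numbers * units[suffixes])
  }, numeric(1), USE.NAMES = FALSE)
}

# Hypothetical helper: read one trace file and tag it with the run name
# from the table above. Assumes a tab-separated file with a "realtime" column.
load_trace <- function(path, run_name) {
  trace <- read.delim(path, stringsAsFactors = FALSE)
  trace$run <- run_name
  trace$realtime_seconds <- parse_duration(trace$realtime)
  trace
}
```

Calling `load_trace()` once per run and combining the results with `rbind()` would give a single data frame keyed by the `run` column, provided all traces share the same columns.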
fastp <- plot_dataflow_single_process(df_max_time = merged_formatted_fastp$time,
                                      df = merged_formatted_fastp$process,
                                      df_storage = merged_formatted_fastp$storage,
                                      group = "fastp",
                                      title = "FastP",
                                      xaxis = "# shards",
                                      outputname = "fastp",
                                      results_folder = results_folder)
fastp[[1]]
# Stack the mapping and BQSR summaries into one two-panel figure (A on top, B below).
paper_plot <- ggpubr::ggarrange(mapping_summary, bqsr_summary,
                                common.legend = FALSE, ncol = 1, nrow = 2,
                                labels = c("A", "B"), font.label = list(size = 20))
paper_plot_ann <- ggpubr::annotate_figure(paper_plot, top = ggpubr::text_grob("", face = "bold", size = 14))
#vc_suppl_ann_plot
# Export the figure as PNG (20 x 30 cm).
ggsave(plot = paper_plot_ann, filename = paste0(results_folder, "png/poster_dataflow", ".png"), device = "png", width = 20, height = 30, units = "cm")
ggsave(plot = paper_plot_ann, filename = paste0(results_folder, "eps/poster_dataflow", ".eps"), device = "eps", width = 20, height = 30, units = "cm")
##   ggpattern       tidyr viridisLite      ggpubr  kableExtra       knitr
##     "1.0.1"     "1.3.0"     "0.4.1"     "0.6.0"     "1.3.4"      "1.42"
##     cowplot     ggplot2     forcats   patchwork       dplyr
##     "1.1.1"     "3.4.0"     "1.0.0"     "1.1.2"     "1.1.0"
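The chunk that printed the package versions above is not shown. A minimal sketch of how such a listing could be produced is given below. The helper name `package_versions` is hypothetical, and the example queries two packages that ship with R so that it runs on any installation.

```r
# Hypothetical helper: return the installed version of each named package
# as a named character vector.
package_versions <- function(packages) {
  vapply(packages,
         function(pkg) as.character(utils::packageVersion(pkg)),
         character(1))
}

# Example with base packages; the versions printed depend on the local R install.
print(package_versions(c("stats", "utils")))
```

Passing the names of the packages attached in this report (for example `c("ggplot2", "dplyr", "ggpubr")`) would list their versions instead, provided they are installed.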